This invention relates generally to detecting website cloaking by generating semantic representations of content at mobile devices, and comparing the semantic representations obtained from the mobile devices with semantic representations of content generated at an online system.
Online systems often have policies regarding what content can be posted to the online system and what content can be linked from content distributed by the online system. For example, an online social networking system may restrict users from posting and linking to certain types of content, such as adult content, violent content, threats, content related to criminal activity, or fraudulent content. To enforce these policies, the online social networking system monitors content and blocks content that is determined to be in violation of a policy. To thwart the online system's ability to detect linked content that violates a policy, certain websites perform cloaking of the content they publish to the online system.
Websites perform cloaking by providing different content to different users. For example, a website may identify the user who is requesting content from the website, or identify information describing the requesting device, such as the device's IP address. The website then provides “good” content to devices that are determined to be within an online system that enforces a policy, such as devices used for monitoring and maintaining the online system, for example, a social networking system. The website provides “bad” content (e.g., content that is in violation of a policy) to other devices, such as devices that are used by users of the social networking system and that are identified as being external to the online system. The good content shown to devices within the online system “cloaks” the content that is shown to external devices, making it difficult for the online system to know the true nature of the content that the website is delivering to the external users of the online system. Conventional techniques fail to detect policy violations by websites that perform cloaking.
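The cloaking behavior described above, and the general idea of comparing what an internal device sees with what an external device sees, can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen only for illustration: the address range `ONLINE_SYSTEM_NETWORK`, the sample page texts, the function names, and the 0.5 threshold. A simple word-set overlap (Jaccard similarity) stands in for a semantic representation, which this section does not specify.

```python
import ipaddress
import re

# Hypothetical address range assumed to belong to the online system's own
# monitoring devices (192.0.2.0/24 is a reserved documentation range).
ONLINE_SYSTEM_NETWORK = ipaddress.ip_network("192.0.2.0/24")

# Illustrative page texts; not taken from any real website.
GOOD_CONTENT = "Healthy recipes and cooking tips for the whole family."
BAD_CONTENT = "Miracle pills: send payment now, guaranteed winnings."


def cloaking_site_response(client_ip: str) -> str:
    """Serve content the way a cloaking website might, keyed on requester IP."""
    if ipaddress.ip_address(client_ip) in ONLINE_SYSTEM_NETWORK:
        return GOOD_CONTENT  # shown to devices within the online system
    return BAD_CONTENT       # shown to external devices


def token_set(text: str) -> set:
    """Crude stand-in for a semantic representation: the set of lowercase words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Overlap of the two word sets, from 0.0 (disjoint) to 1.0 (identical)."""
    tokens_a, tokens_b = token_set(a), token_set(b)
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def looks_cloaked(internal_view: str, external_view: str,
                  threshold: float = 0.5) -> bool:
    """Flag a page when the two views are less similar than the assumed threshold."""
    return jaccard_similarity(internal_view, external_view) < threshold


if __name__ == "__main__":
    internal_view = cloaking_site_response("192.0.2.10")    # inside the range
    external_view = cloaking_site_response("198.51.100.7")  # outside the range
    print(looks_cloaked(internal_view, external_view))
```

In this sketch the online system never sees the “bad” content when it fetches the page from its own address range, which is why the comparison needs a view obtained from an external device.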